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Abstract This paper addresses the numerical aspects of the maximum like- 
lihood estimation of linear discrete-time dynamic stochastic systems. The 
gradient-search optimization methods require the computation of the likeli- 
hood function, its gradient (known as a "score") and the Fisher information 
matrix (FIM). This leads to implementation of the Kalman filter (and its 
derivatives with respect to unknown system parameters), which is known to 
be numerically unstable. The array square-root (ASR) algorithms are the most 
preferable now for practical implementation of the Kalman filter. In this paper, 
we develop an elegant and simple method for computation of the derivatives 
of the filter variables to unknown system parameters required in a variety of 
applications. The new technique easily extends functionality of any ASR filter 
and leads to a stable score and FIM evaluation. Furthermore, the method is 
ideal for simultaneous state estimation and parameter identification since all 
values are computed in parallel. Finally, the new approach replaces the stan- 
dard technique based on direct differentiation of the Kalman filtering equations 
(with their inherent numerical instability) and, hence, improves the robustness 
of computations against roundoff errors. 
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1 Introduction 



The method of maximum likelihood is a general method for parameter estima- 
tion and is often used in system identification. To implement it, it is necessary 
to maximize the likelihood function, which is usually done using the gradient 
approach. It involves the computation of the likelihood function and its gra- 
dient (with respect to unknown system parameters) known as a "score" . For 
dynamic stochastic systems this task leads to implementation of the Kalman 
filter (KF). Additionally, it requires determination of sensitivities of the sys - 
tem state to unknown parameters as explained in I Gupta and Mehral (|l974h . 



Other applications with s imilar require ments arise, for instance, in the field of 



optimal input design; see iMehral (]1974T ) etc 



Since the appearance of the KF in 1960s, the method has been extensively 
used in practice playing a central role in estimation. However, a serious limita- 
tion of the KF is a problem of its numerical instability with respect to round off 
errors. As a result, there has been also great practical interest in the develop- 
ment of numerically stable and computationally efficient KF implementation 
methods. To deal with the problem of the numerical instability, a number 
of square-root (SR) algorit hms have been derived for the KF and smoothing 
formu la s; see, for instance, Biermanl dl977l); Kaminski et all (1 19 71 ^_ Morf et al 



1971 : ISaved and KailathI (Il994 : IPark and KailathI (|l995allbl . Il99fih : iGibbs 



20111 ) and many others. Current implementations of the KF are most often 
expressed in (what is called) an array form. Such methods imply utilization 
of numerically stable orthogonal transformations for each recursion step. This 
feature enables more efficient parallel implementation and leads to algorithms 
with better numerical stability and conditioning properties; see ( Kailath et all . 



20001 Chapter 12) for an extended explanation. 



Despite the existing diversity of the efficient KF implementation methods, 
the computation of the sensitivities of the system state to unknown parame- 
ters in terms of numerically favored filtering algorithms is seldom addressed. 
In this paper we design simple and elegant computational schemes that al- 
low for a natural extension of any ASR KF algorithm on the case of sys- 
tem state sensitivity computation required in a variety of applications, e.g. in 
the maximum likelihood estimation. Our approach avoids implementation of 
the conventional KF (and its direct differentiation with respect to unknown 
system parameters) because of its inherent numerical instability and, hence, 
leads to more stable algorit hms. The first paper on a stable score evaluation 
in terms of ASR filters is bv lBierman et al ( 1990t) . The fil ter covere d there be- 



longs to the class of info r matio n-type algorithms. Then, Kulikova (|2009al lbh 



Tsveanova and Kulikoval (l2012h proposed a stable method for score evalua- 



tion in the covariance-type ASR filter. The new result presented here unifies 
all outcomes of the cited papers and develops a stable score evaluation mecha- 
nism that works in any ASR filters. Additionally, we treat here the case, which 
has never been covered earlier, but constitutes an importa nt class of the KF 
methods, e.g. the class of the information-type filters from Park and KailathI 
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(|l995aT ) . The numerical examples are given to demonstrate the effectiveness of 
the algorithms and the main results obtained. 

The paper is organized as follows. In Section 2 we discuss the maximum 
likelihood estimation procedure of linear discrete-time dynamic stochastic sys- 
tems. The class of ASR filters is briefly described in Section 3. Section 4 
presents the new theoretical results for sensitivities computation used for 
a score evaluation (the corresponding MatLab codes are presented in Ap- 
pendix A for readers' convenience). In Section 5 we show how to use the new 
theoretical results in order to construct numerically stable score evaluation al- 
gorithms. The Fisher information matrix is computed at the same way. Some 
numerical results and the comparison with the conventional KF approach are 
given in Section 6. Finally, Section 7 concludes the paper. 



2 Maximum likelihood estimation of state-space models 

Consider the discrete-time linear stochastic system 

x k+1 = F(9)x k + G(9)w k , k = 0,...,N, 
z k = H(9)x k + v k 



(1) 
(2) 



where Xk € K™ and z k S R m are, respectively, the unknown state and the 
available measurement vectors; A: is a discrete time, i.e. Xk means x(t k ). The 
process noise, {w k }, and the measurement noise, {v k }, are Gaussian white- 
noise processes, with covariance matrices Q(9) > and R{9) > 0, respectively. 
All random variables have known mean values, which we can take without loss 
of generality to be zero. The initial state xq ~ M(0, IIo(9)) and 













Vk 









Wj Vj Xq 1 



Q{9) 
R{9) 







, 
hj 

n {9) o 



(3) 



where Skj denotes the Kronecker delta function. 

Application of the KF assumes complete a priori knowledge of the state- 
space model parameters, that is a rare case in practice. Hence, we assume that 
system (P) , ^ is parameterized by a vector of unkno wn system parameters 9 £ 
R p , w hich needs to be estimated. As mentioned in ISarkka, and Nummenmaa 
the classical way of solving the problem of uncertain parameter is to use 
adaptive filters where the model parameters are estimated together with the 
dynamic state. For instance, the method of maximum likelihood is a general 
method for parameter estimation and can be used to construct the adaptive 
filter. Let us discuss such adaptive computational schemes in details. 

Solving the problem of parameter estimation by the method of maximum 
likelihood req uires the m a ximiz ation of the log Likelihood Function (LF) given 
as follows; see ISchwennel (Il965h : 



Co 



Nm 



1 N 

ln(27r) - - ]T {in (det R e , k ) + e T k R~le k } 



(4) 



fe=i 
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where Z± = {zi, . . . , z^} is iV-step measurement history and the innovations, 
{e/c}, e ~ Af (Q,R e ,k), are generated by the discrete-time KF: 

x k+ i\ k = Fx k \ k _ x + K p , k e k , with e k = z k - Hx k \ k _x, £ |-i = 0, (5) 

K p , k = K k R-l K k = FP k]k _ 1 H T 1 R e>k =R + HP k]k _ 1 H T (6) 

where the error covariance matrix fW_i = E Hx k — x k \ k -\){x k — x k \ k _i) T } 
satisfies the difference Riccati equation 

P k +i\k = FP k]k _ x F T + GQG T - K Ptk R e<k K£ k , P 0M = 77 > 0. (7) 

We need to stress that the model is parameterized by a vector of unknown 
system parameters 9 and, hence, all the entries of the matrices F 6 M. nxn , 
G £ R nxq , H e R roxn , Q € M« x «, R G R mxm and iT e R nxn are functions 
of 9. For the sake of simplicity we suppress the corresponding notations above 
and below, i.e instead of F(9), G(6), H{&) etc. we will write F, G, H etc. 

The maximization of the log LF is often done by using a gradient ap- 
proach or Newton's type methods where the computation of the Likelihood 
Gradient (LG) and the Fisher Information Matrix (FIM) is necessary. In this 
manuscript, we do not discuss the particular optimization method that can be 
applied in each particular situation, but explain how the log LF, its gradient 
and FIM can be computed in parallel with the system state estimate by means 
of some extension of the ASR based KF algorithms. Such methods are ideal 
for simultaneous state estimation and parameter identification. Besides, they 
avoids implementation of the conventional KF with its inherent numerical in- 
stability. Other computational aspects of the maximum likelihood estimation 
a nd different gradient-base d nonlinear programming methods are discussed 
in lGupta and Mehra ( 1974 ) at large. 



Assu me that we chose som e optimization method. For instance, as men- 
tioned in Bierman et all ( 199Clh the scoring equation could be used in practice 



^) = |W-(^ (P) )- 1 ^| J 



(8) 



where J- is the FIM, dCg/d9 is the gradient vector (score) and 9^ denotes 
the value of 9 after r iterations of algorithm ([8]). 

Below we explain how the next cycle for computing can be obtained 

by using the scoring equation (J5J and the current approximation 9^ r \ Given 
the current value 9^ r \ we need to compute log LF which can be also 
written as follows: 

Co (Z?) = H2n) - \ J2 {2 In (det < fe 2 ) + efo } (9) 

fc=i 

where e k = R ^ 2 e k are the normalized innovations and R X J k is a square- 
root factor of i? e ,fc- Throughout the paper we use the Cholesky decomposition 
of the form A = A T ^ 2 A 1 ^ 2 , where A 1 ! 2 is an upper triangular matrix with 
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positive diagonal entries. We also introduce the notation A 1//2 

A T/2 = and A -T/2 = (^-1/2)T_ 

The log LG can be computed as shown in iKulikoval (|2009al ). i.e. 



{A 1 / 2 ) 



3Cg [Z{ 

36, 



N \ 



N 



£ tr 



fc=i 



r: 



BE 1 ' 2 

-1/2 01t e.k 



where tr[-] denote s the trace of matrice s. 

As discussed in Bierman et all ( 1990h , entries of the FIM obey the following 
equation: 



N 



fc=i 



^ k 36, 



trE 



3e k del 
39, d8j _ 



(11) 



where i,j = l,...,p and Tij denotes the (i, j)-th element of the matrix. The 
expectation, E{-}, is taken over the whole sample space . As mentioned 
in the cited paper, equation (fTTj) could be also used by replacing the expected 
values with sample values, as it is usually done in practice. 

Thus, the computation of the log LF leads to implementation of the discrete- 
time KF. The calculation of the score and FIM, in turn, requires determination 
of the derivatives of the filter variables (with respect to unknown parameters). 
This leads to a set of p vector equations, for computing dx^ik-i/dOi, known as 
the filter sensitivity equations, and a set of p matrix equations, for computing 
<9Pfc|fc_i/i9#i, known as the Riccati-type sensitivity equations. 



3 Array square-root KF algorithms 

Algorithm ([5]) - (J7J is called the conventional KF. The matrix Pk\k-i ap- 
pearing in ([7]) has the physical meaning of being the variance of the state 
prediction er ror, x k — , and therefore has to be nonnegative-definite. As 

mentioned in Hassibi et all (|2000l ) , round off errors may destroy this property, 



thus, throwing all the obtained results into doubt. Since the appearance of 
the KF in 1960s, there has been great practical interest in the development of 
numerically stable and computationally efficient KF implementation methods. 

ASR algorithms are the most preferable now for practical implementation 
of the KF. In contrast to the conventional KF, they propagate only square- 

1/2 

root factors Pf.L_i of the covariance matrices Pk\k-i> k = 0, . . . , N. The point 

A A 7/2 * 1 /2 

is that the product of the computed factors, say Pk\k-i — Pk/k-l^klk-v ^ s 
a symmetric matrix with positive elements on t he dia gonal and it is almost 
certainly nonnegative-definite; see iKailath et al (2000) for more details. Fur- 



thermore, any ASR filter uses a numerically stable orthogonal rotation at each 

1/2 1 /2 

iteration step to update P k \ k _ 1 , i-e. to find next P k ^ k - Let us consider a few 
examples. 
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To illustrate the new theoretica l approach, we dea l with th e covariance- and 
information- type ASR filters from Park and Kailathl (1995a). The covariance- 



1/2 

type filter (the eSRCF) is given as follows: set the initial values P Q /_ 1 
0) Pq\-i x o\-i = n Q xq and then recursively updatqj 



nl /2 > 



R 1 / 2 

p,y, 2 ,h t p},{ 2 v 

k\k— 1 k\k — 1 

Q 1 l 2 G T 



F 1 



-R- T/2 z k ' 




P 1/2 , 
u ^fe+llfe 


-efe 




p-T/2 f 
*k\k-l X k\k-l 




p -T/2 f 


(12 










Ik 





where Qfe is any orthogonal rotation that upper-triangularizes the first two 
(block) columns of the matrix on the left-hand side of (fT2"|) and K p _k = 



jr r>T/2 - 
K P;kR e ,k > ^ 



R ^ 2 £k are the normalized innovations. 



The information-type filters work with the inverse of the error covariance 
matrix fW_i. Such methods are of particular interest when there is a very 
poor or little informati on on the initial value of th e state estimate. We consider 
the eSRIF developed in lPark and Kailathl (1995a) that is given in the following 

o T ^ 2 xq = P Q T ^ 2 xq, calculate 



— T/2 - 

form: given P Q and P n 



R-T/2 _ R -T/2 H p-l R-T/2 HF -l GQ T/2 

o p-;L\f-- -p^LIf-'gqti* 

/ 



-R- T,2 z k 

p-T/2 » 
^k\k-l X k\k-l 



R 



-T/2 



k 
T/2 
k+l\k A 

* 



Pi, i i k P, 





~ T/2 

k+l\k u 



-efe 

p -T/2 f 
"k+l\k Xk + 1 \ k 



(13) 



where Qk is any orthogonal transformation such that the first three (block) 
columns of the matrix on the right-hand side of formula (|13[) is block lower 
triangular matrix. 



Remark 1 The predicted estimate aS/s+iifc can now be computed from eSRCF JT 
as follows 

(14) 



Xk+l\k 



( pT/2 \ /p-T/2. 
\ r k+l\k J yk+l\k X k+l\k 



We note that the parenthesis are used to indicate the quantities that can be 
directly read off from the right-hand side matrix in (TT21 . Hence, no matrices 
need to be inverting in calculating the state vector estimate. 

Remark 2 In eSRIF (|13p . the estimate x^.+i\k can be found by solving the 
triangular system 



( P fc+ T l|fe) ( X k+l\k) - (P k +l\k X k+l\k 



(15) 



As an above, the parenthesis are used to indicate the quantities that can be 
directly read off from (fl3|) . 



1 The algorithm can be verified by "squaring" both sides of J 1 21) . using the fact that 
QfeQfc = I> an d comparin g the entries o f both sides of the result. The detailed derivation 
can be also found in lKailath et all 1120001 ). 
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Remark 3 Apart from numerical advantages, this class of niters seems to be 
better suited to parallel implementations since the state vector can be found 
as a product of the quantities that are readily available from the right-hand 
side matrices in (IT^I) . (fT5|) . 

As mentioned earlier, the maximum likelihood estimation procedure leads 
to implementation of the KF (and its derivatives with respect to unknown 
system parameters), which is known to be numerically unstable. It is desirable 
to avoid the use of the conventional KF in the computational scheme. In other 
words, we would like to replace the disadvantageous conventional KF by nu- 
merically stable ASR filters, e.g. by the eSRCF and/or eSRIF presented above. 
In connection with our goal, we design a general computational approach that 
allows to update the filtering equations together with the sensitivity equations 
in terms of any numerically stable ASR filter. 



4 ASR Filter Derivatives Computation 

In this section, we present a simple and convenient technique for computing 
derivatives of the ASR filter variables. First of all we note that each iteration 
of any ASR filtering and/or smoothing algorithm has the following form: 



QA = R 



(16) 



where A is a rectangular matrix, and Q is any orthogonal transformation 
matrix that when multiplied by A gives either a block upper or lower triangular 
matrix R. The matrix A is usually called the pre-array and R is the post-array. 

Next, we treat the upper and lower triangular cases of the matrix R, sep- 
arately, and prove the following main results. 

Lemma 1 (the lower triangular case) Let entries of the pre-array A G 
R.(s+fe)x(s+;) in U6)) be known differentiable functions of a parameter 9. Con- 
sider the equation 



s 


I 




s 


I 


'A n 


A 12 ~ 


k _ 





R12 


A 2 i 


A 22 _ 


s 


R21 


R22 



(17) 



where Q 6 R( s+fc ) x ( s + fc ) is an orthogonal matrix that lower-triangularizes the 
first (block) column of the matrix on the left-hand side of J_?7| ) and R21 € M sxs 
is lower triangular. Introduce the notation 



s 


I 




s 


I 


'We 


We' 


k 


X 


N~ 


.We 


We. 


s 


Y 


V 



k . 

s 



(18) 



Then given the derivative of the pre-array A' g , the following formulas calculate 
the corresponding derivatives of the post-array blocks: 



(R2i)' e = {U T + D + L)R 21 , 



(19) 
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(R22)' e = [U T ~ U] R22 + R^X T R 12 + V (20) 

where L, D and U are respectively strictly lower triangular, diagonal and 
strictly upper triangular parts of the following matrix product YR 2 \ ■ 

Proof At first, we show that Q' g Q T is a skew symmetric matrix. For that, we 
differentiate both sides of the formula QQ T — I with respect to 9 and arrive 
at Q' g Q T + Q (Q T )' g = 0, or in the equivalent form Q' g Q T = - (Q' g Q T ) T '. The 
latter implies that the matrix Q' e Q T is skew symmetric and can be presented as 
a difference of two matrices, i.e. Q' g Q T — U T — U where U is an (s + k) x (s + k) 
strictly upper triangular matrix. Thus, the last (s x s)-block located at the 
main diagonal of Q' g Q T has the same form, i.e. 

[Q' e Q T ] sxs = Uj xs -U sxs (21) 

where U s xs is a s x s strictly upper triangular matrix and [QgQ T ] SX5 . stands 
for the (s x s)-matrix composed of the entries located at the intersections of 
the last s rows with the last s columns of the product Q' g Q T ■ 

Next, we prove that the above-mentioned matrix U sxs is, in fact, the upper 
triangular part of the matrix product YR^f. To do this, we differentiate the 
first equation in formula (|17p . i.e. 



An" 


k _ 





k 


A 2 i_ 


s 


R21 _ 


s 



Q 



with respect to 9. Then, taking into account notation (|18p and equality A 
Q T R, we obtain 





(R2l) g 



= Q'e 



An 
A 2 i 



Q 



(AnYe 
(A 2 i)' 6 



= QeQ 




R21 



(22) 



Further, it is not difficult to see that the pseudoinverse matrix (Moore-Penrose 
inversion) of R — [0 R21] is R + = [0 R21] ■ Therefore the right multiplication 
of both sides of (f2"2"j) by this matrix R + yields 



(i?2i)e R-2i 



Q'eQ T [0 



fex k 



Isxs] + 



[0 R^ 1 ] 



(23) 



where / sxs is the identity matrix of dimension s and 0fc X fe is the zero block of 
size k x k. The [Okxk © Isxs] means diagjO/cx/c, Isxs}- Now we remark that 



(24) 



"0 







"O 


\He^ \ col- last s 





(Ifal)'g R21 










+ 


"0 


XR 2 i 










where [QgQ T ] r °™ V T** ^ stands for the (k x s)-matrix composed of the entries 
located at the intersection of the first k rows with the last s columns of the 
matrix Q' g Q T . 
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Prom the matrix equation (l2"4")l . we conclude the following. First, the matrix 
on the left-hand side of ([2"4"| is block lower triangular. Thus, the strictly upper 
triangular part of the matrix [Qe»Q T ] sxs must exactly annihilate the strictly 
upper triangular part of the corresponding second term on the right-hand side 
of (pM|) . In other words, if the matrix product YR21 is represented as 



Us 



where L SXSl D sxs and U sxs are respectively the strictly lower triangular, di- 
agonal and strictly upper triangular parts, then the matrix U sxs , m fact, 
satisfies ([2~T]) . 

Now formula (fH?)) is easily justified. Indeed, from the matrix equation (|24p . 
we obtain 



(i?2i)e R21 - u > 



T 

sxs 



Ls 



Ds 



+ Usxsi 



[Q'eQ 1 
{U SX s 



Ds 



Ls 



YR~ 
3)^21- 



For the sake of simplicity, in equation (|19p we omit the subscripts of the 
matrices L, D and U. 

Second, from the matrix equation we observe that the first (block) 
row of the left-hand side matrix in is zero. Thus, the first (block) row of 
the matrix Q' S Q T must exactly cancel the corresponding block of the second 
term in (1241). i.e. we arrive at 



[Qm c zz: =-x^- (^) 

Next, we wish to validate ((20|) . By differentiating the last equation in (|17p . 



i.e. 



I 




I 




~ A 12 ~ 


k 


R12 


k 


A22 


s 


R22 


s 



Q 



with respect to 9, and taking into account notation (|18p . we derive 



(Rl2)' t 
{R22)' t 



An 
An 



Q 



(Al2)' e 

(A22)' e 



Q'eQ T 



R12 
R22 



The previous formula implies that 
{R22)' e = V 



wrz r^ 



col: first k 



[Q'eQ T ] sxs R22 

[Ujxs - Usxs] R2 



(26) 

where U sxs is the upper triangular matrix from (|2ip and [QgQ T ] ™^ j^l t ^. 
stands for the (s x fc)-matrix composed of the entries located at the intersec- 
tions of the last s rows with the first k columns of the product Q' g Q T ■ 
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Eventually, formula (125)) and the fact that Q' g Q T is skew symmetric result 



[Q'eQ 



l ^-.j 7 -] row: last s 
col: first k 



[Q'eQ 7 ] Zl. 



row: first k 
last s 



— - [-XR 2 l] — R 2 ^X T 



(27) 

Thus, the substitution of ([27)1 in ([25)) validates ([20)) and completes the 
proof of Lemma [T] 



Lemma 2 (the upper triangular case) Let entries of the pre-array A £ 
l>(s+fc)x(s+i) in U6]) be known differentiable functions of a parameter 9. Con- 
sider the equation 



s 


I 




s 


I 


"An 


Au~ 


s 




Rl2 


A 2 i 


A 2 2_ 


k = 





R22 



(28) 



where Q £ R( s + fe ) x ( ;5 +' c ) is an orthogonal matrix that produces the block zero 
entry on the right-hand side of i e 28\) and Rn G R sxs is upper triangular. 
Introduce the notation 



Q 



(An)' ( 
(4u) 



I 

(A-- 
(A. 



I 

N~ 
V 



s . 

k 



(29) 



Then given the derivative of the pre-array A' e , the following formulas calculate 
the corresponding derivatives of the post-array: 



(R u )' = (L T + D + U)R 1U 



(30) 



(R 12 y e = [L 1 - L\ R 12 + i? n J Y 1 R 22 + N (31) 

where L, D and U are respectively strictly lower triangular, diagonal and 
strictly upper triangular parts of the following matrix product XR^ . 

Proof Lemma [2] is proved at the same way as Lemma U The detail derivation 



of the formulas above can be also found in iKulikoval ([2009b). 



m 

(201 



We see that the proposed computational procedure utilizes the pre-array A, 
its derivatives with respect to the unknown system parameters, the post-array 
R and the orthogonal transformation Q in order to compute the derivatives of 
the post-array. This technique naturally extends any ASR filter on the deriva- 
tives computation (with respect to unknown systems parameters) required, for 
instance, in the maximum likelihood estimation of state-space models CE}-([2]). 
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4.1 Summary of Computations 

For readers' convenience, the entire scheme of computation is presented in 
the form of a pseudo-code. In section 5 we explain how this theoretical result 
can be applied to the eSRCF and eSRIF in order to compute the log LF, its 
gradient and the FIM. 

Step 0. Set a current value of 9^ , 

Step 1. Evaluate F = F U M , G — G U (r) , H — H U (r) etc; 



OF _ OF 

Wi ~ M 



dG_dG 
d0i ~ 86, 



dH _ dH 



etc; 



Use the Cholesky decomposition to find LIq, Q 1 ^ 2 , R 1 / 2 . 

q i it . ... , D l/2 A-l/2 - n dP o\-l dill 12 dx 

Set the initial conditions: FJ — llJ , x i_i = 0, — — ± — = — — — , — — = 0. 

U| 1 1 oOi o&i adi 

For t k , k = l,...,N, do 
Step 2. Apply a numerically stable ASR filter instead of the conventional KF: 
Step 2a. Form the pre-array A (use the data from Step 1); 

Step 2b. Compute the post-array R using QR algorithm. Save {Q, R} for Step 3. 

Step 3. Apply the designed derivative computation method (for each 6i : i = 1, . . . ,p): 

OA 

Step 3a. Form the derivatives of the pre-array — — (use the data from Step 1); 

OA 

Step 3b. Compute Qtttt (use the Q from Step 2); Save {Xi,Yi,Ni,Vi}\ 

Step 3c. If R is upper triangular, calculate Xj,R^ . Split it into Li, Di, Uf, 

lower triangular, calculate YiR^ 1 . Split it into Li, Di, Of, 

dR 

Step 3d. If R is upper triangular, compute = (LJ + Di + UARn; 

O&i 

OR 

is lower triangular, compute = (llf + Di + Lj) R^i ; 



Step 3e. If R is upper triangular, find — ^ = [Lj - L,] R 12 + R^Y? R 22 + N. t 

is lower triangular, find = \U i — C/jJ i?22 + -R21 ^12 + K'- 

End i loop and then k loop. 

We stress that our methods allow the filter and Riccati-type sensitivity 
equations to be updated in terms of stable ASR filters. 



5 Illustrative examples: ASR methods for the score and FIM 
computation 

In this section we explain how the log LF, the score and FIM can be computed 
in parallel with the system state estimate by means of some extension of the 
ASR based KF algorithms. Such methods are ideal for simultaneous state 
estimation and parameter identification. 

It is not difficult to see that log LF © can be computed directly from 
eSRCF Unj} and/or eSRIF ([Tg]). Thus, instead of the conventional KF, which 
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is known to be numerically unstable, we utilize stable ASR filters. However, to 

1 /2 

compute the score and FIM one needs to calculate dR e k fdOi and de^/dOi, i = 

1/2 _____ 

1, . . . ,p, in addition to the filter variables e& and Rj k ; see formulas (fTUj) . (fTTl) . 
It is an easy task, if we apply Lemma [I] or [2] For instance, let us first consider 
cSRCF (fT2")l . We note that the post-array is an upper triangular matrix. Hence, 
we apply Lemma [2] to equation ([1 2\ with 1 = 1, s = m + n, k = q and the 
following partitioning: 





R-21 R22 

The computational scheme presented in Section 4.1 leads to the filter derivative 

1/2 

computations and, in particular, to the dR e ' k jdQi and de^jdQi, i = 1, . . . ,p 
evaluation, which needs to be used in (ITU|) . (fTTj) . The de tailed der i vation of 
this method for the score calculation can be also found in iKulikoval (|2009afh 

At the same way the information-type algorithm can be easily obtained 
from eSRIF (fl~3| by using LemmaQ] We note that 1 = 1, s = m + n + q, k = 
and consider the following partitioning: 



R- T ' 2 -R^^HF- 1 R- T ' 2 HF- 1 GQ T ' 2 








p m-l F - 1 







-R- T l 2 



Zk 



P, 



-T/2 



k\k- 



1 





where the blocks An, Ayi and R\\, R\i are empty. The computational scheme 
presented in Section 4.1 leads to the filter derivative computations and, in 



1, . . . ,p computation. We 



particular, to the dR e ^ 2 /d9i and dek/d9. 

may also note that since the elements dR~^ 2 / dOi are directly available from 
the algorithm, it is more convenient to rewrite formula fj 10[) as follows: 



dC e (Z? ) 



89,. 



N 

E 

fe=i 



tr 



R 



1/2 

e , k 



d6i 




.p. 
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The formula above can be easily justified if we take into account that dR e k jdQ% = 

-R~l /2 (9R 1 JJ/de i )R;^ 2 ,i = i,...,p. 

Remark 4 In the KF formulation for the computation of sensitivities of the 
system state to unknown parameters, it is necessary to run the "differentiated" 
KF for each of the parameters 9t to be estimated. In the proposed computa- 
tional scheme this "bank" of filters is replaced with the augmented array to 
which orthogonal transformation is applied. 

Remark 5 The proposed computational scheme naturally extends any stable 
ASR filter and allows the filter and the filter sensitivity equations to be updated 
in parallel. Hence, such methods are ideal for simultaneous state estimation 
and parameter identification. 



6 Numerical Examples 

6.1 Simple Test Problem 

First, we would like to check our theoretical derivations presented in Section 4. 
To do so, we apply the proposed ASR based computational scheme to the 
following simple test problem. 



Example 1 For the given pre-array 



.4 



6» 5 /20 4 /8 3 /6 
# 4 /8 9 3 /3 9 2 /2 
9 3 /6 9 2 /2 9 



9 3 /3 
9 2 /2 
1 



(32) 



compute the post-arrays R from (|2"5)) and its derivative R' g , say, at 9 = 2 
where the first three (block) columns of the post-array R is an upper triangular 
matrix; see equation 



Since the pre-array A is fixed, we skip Steps 0, 1 in the computational 
scheme presented in Section 4.1. We note, that the unknown parameter 9 is a 
scalar value, i.e. p = 1. For simplicity, we assume that N = 1, i.e. we illustrate 
the detailed explanation of only one iteration step. Next, we note that I = 1, 
s = 3, k = and, hence, we have the following partitioning of the pre-array: 



in 



9 5 /20 9 4 /8 9 3 /6 
9 4 /8 9 3 /3 9 2 /2 
9 3 /6 9 2 /2 9 



9 3 /3 
> 2 /2 
1 



where the blocks A21, A22 are empty and, hence, R21, R22 are also empty. 
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Table 1 Illustrative calculation for Example^ 



Step 2. Apply numerically stable ASR filter 
WJ20 

e 4 /8 

6» 3 /6 



6» 4 /8 e :i /Q 
6» 3 /3 9 2 /2 

e 2 /2 e 



W/3 
2 /2 

1 



2a 



2b. 



Given A ■■ 



compute R - 



-2.8875 





-3.8788 
-0.2576 




-3.0476 
-0.6954 
0.0797 



-3.3247 
0.8886 
0.5179 



1.6000 2.0000 1.3333 
2.0000 2.6667 2.0000 
1.3333 2.0000 2.0000 



where Q ■ 



-0.5541 
0.5795 
0.5976 



2.6667 
2.0000 
1.0000 

-0.6926 
0.0773 
-0.7171 



-0.4618 
-0.8113 
0.3586 



Step 3. Computation of the derivatives of the ASR filter (p=l) 



J 4 / 4 9 A /2 W/2 
q'a /2 a 2 a 



4 4 2 


4 


4 4 2 


2 


2 2 1 






3a. 



3b. 



3c. 



3d, 3c. 



Given AL 



e 2 /2 



i 



Compute QA' g . Denote X\ 



So, A'g 

-5.9105 
1.0045 
0.2390 



6=2 

-5.9105 
1.0045 
0.2390 



-2.9552 
0.5022 
0.1195 



Ni 



-3.6017 
2.4725 
0.9562 



Find XiR" 1 



2.0469 -7.8778 -27.5511 
-0.3479 1.3388 4.6822 
-0.0828 0.3186 1.1143 



Dx 



2.0469 
1.3388 
1.1143 



Ui 



Split it into L\ = 

-7.8778 -27.5511 
4.6822 






-0.3479 
-0.0828 0.3186 



Hence, the derivative of the post-array is R'„ 



-5.9105 -5.8209 -2.7199 
-0.3448 -0.5325 
0.0888 



-3.9537 
1.4810 
0.3978 



Accuracy of the computations: 



(A T A) r .~ - (R T R)' e __ 



2.17 ■ 10 



-14 



Now, we apply the computational scheme presented in Section 4.1 to com- 
pute the post-array R and its derivative (at the point 6 — 2). We note that 
R is block upper triangular. The obtained results are summarized in Ta- 
ble 1. As follows from ([T5| . we have A T A = R T R. Thus, the derivatives 
of both sides of the latter formula must also agree. We compute the norm 



(A T A)' e=2 - (R T R)' e=: 



2.01 • 10 



-14 



This confirms the correctness of 



the calculation and validates the theoretical derivations of the present paper. 
The lower triangular case can be justified at the same way. 



6.2 Application to the Maximum Likelihood Estimation 

In this section, we consider a set of simple simulation examples that illustrate 
the advantages of the suggested ASR based technique applied to the maximum 
likelihood estimation of unknown system parameters. To observe the difference 
between the conventional KF approach and the ASR based approach, we con- 
sider the following ill-conditioned test problem. 

Example 2 Consider the state-space model CQ)-© w ith {F,G, H, IIq,Q, R} 
given by 

F = h,G = 0,R= I 2 5 2 9 2 ,n Q = I 3 9 2 ,H = 



1 1 1 

111 + 5 
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where 9 is an unknown system parameter, that needs to be estimated. To sim- 
ulate roundoff we assume that S 2 < e 10 undoff, but 6 > e 10 undoff where e roU ndoff 
denotes the unit roundoff errojf]. 

In this set of numerical experiments we need to estimate the unknown 
system parameter 9 on the set of the ill-conditioned test problems, i.e. for 
different values of the ill-conditioning parameter 5. It is well known that the 
filtering problem of finding x-iy, i.e. the linear least square estimate of Xi given 
{z\ 1 . . . , Zj}, will be well posed if and only if the covar iance matrix of the 
stacked output {zi, . . . , Zj} is strictly positive definite; see lKailath et a i (teoooh 



for more detailes. This condition holds when R > and may hold even if the 
matrix R > 0. The numerical example above demonstrates how a problem that 
is well-conditioned, as posed, can be made ill-conditioned by the adaptive filter 
implementation. We may note that R > for any 9^0. However it is not 
difficult to see that for any fixed value of system parameter 9^0, the matrices 
R e:1 = R+Hn H T and (R e ,i)' e are ill-conditioned in machine precision, i.e. as 
8 — > froundoff- This leads to a failure of the conventional KF approach and, as 
a result, destroys the entire computational scheme. So, we plan to observe and 
compare performance of the "conventional KF technique" and our new "ASR 
based approach" in such a situation. Also, we have to stress that Example [2] 
represents numerical difficulties only for the covariance-type implementations. 
As a result, the SRIF based method can not be us ed in our comparative study 
becau se of its information- type. This follows from lVerhaegen and Van Poorer] 



(1986) where numerical insights of various KF implementations are analyzed 
and discussed at large. 

In order to judge the quality of each above-mentioned computational tech- 
nique, we conduct the following set of numerical experiments. Given the "true" 
value of the parameter 9, say 9* = 10, the system is simulated for 1000 sam- 
ples for various values of the problem conditioning parameter 5. Then, we use 
the generated data to solve the inverse problem, i.e. to compute the maxi- 
mum likelihood estimates by the above-mentioned three different approaches. 
The same gradient-based optimization method with the same initial value of 
— l is applied in all estimators. More precisely, we implement the standard 
MATLAB built-in function fminunc. This optimization function utilizes the 
negative Log LF and its gradient that are calculated by both techniques. All 
methods are run in MATLAB with the same precision (64-bit floating point). 
We performed 100 Monte Carlo simulations and report the posterior mean for 
9, the root mean squared error (RMSE) and the mean absolute percentage 
error (MAPE). 

Having carefully analyzed the obtained results presented in Tabled we con- 
clude the following. When 8 = 10~ 2 (it corresponds to the well-conditioned 
situation), all the considered approaches produce exactly the same result. Be- 
sides, the posterior means from all the estimators are all close to the "true" 



2 Computer roundoff for floating-point arithmetic is often characterized by a single 
parameter e roun dofti defined in different sources as the largest number such that either 
1 + ^roundoff = 1 or 1 + e roun doff/2 = 1 in machine precision. 
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Table 2 Effect of roundoff errors on the estimated 9* MLE in ill-conditioned test problems 



8 


exact answer 
(9* 


conventional KF technique 


ASR based approach 


Mean 


RMSE 


MAPE 


Mean 


RMSE 


MAPE 


icr 2 


10 


9.9932 


0.1761 


1.4186 


9.9932 


0.1761 


1.4186 


icr 3 


10 


10.0292 


0.1642 


1.2759 


9.9943 


0.1616 


1.2838 


icr 4 


10 


10.6205 


0.7307 


6.3493 


9.9932 


0.1761 


1.4185 


10" 5 


10 


10.5518 


0.6902 


5.8654 


9.9932 


0.1761 


1.4185 


10" 6 


10 


6.6430 


5.9554 


40.4298 


9.9922 


0.1605 


1.2663 


io- 7 


10 


-0.6601 


12.5707 


109.7138 


9.9825 


0.1721 


1.4057 



value, which is equal to 10. The RMSE and MAPE are equally small. Hence, 
we conclude that both methods work similarly well in the well-conditioned 
situations. Again, it validates the theoretical derivations of the present paper. 

We learnt from the previous set of numerical examples that both meth- 
ods, i.e. the conventional KF approach and the proposed ASR based scheme, 
produce exactly the same results in the well-conditioned situations, see the 
rows with 5 — 10~ 2 , 10 -3 in Tabled However, as S — > e roun( joff, which corre- 
sponds to growing ill-conditioning, these two techniques no longer agree. We 
observe that the conventional KF approach degrades rapidly as S — > e r oundoff- 
Already for 5 — 10~ 4 , the RMSE of the classical technique (i.e. the conven- 
tional KF based approach) is w 4 times greater than those of the ASR based 
method. The MAPE of the conventional KF approach is « 4.5 times greater 
than the MAPE from the ASR based technique (« 6.35% against m 1.42%). 
For 8 < 10~ 6 , the output generated by the classical technique is of no sense 
because of huge errors. More precisely, the RMSE of the conventional KF ap- 
proach for 6 < 1CT 6 is » 5.96 and the MAPE is « 40.43%. On the other hand, 
the proposed ASR based computational scheme works robustly, i.e. with small 
errors, until S = 10~ 7 , inclusively. For instance, the RMSE of the ASR based 
approach for S < 10~ 7 is « 0.17 and the MAPE is « 1.41%. 

In conclusion, the conventional KF approach performs markedly worse 
compared to the proposed ASR based scheme. This creates a strong argu- 
ment for practical application of the latter technique. 



7 Conclusion 

In this paper, we developed an elegant and simple method that extends func- 
tionality of any array square-root Kalman filtering algorithm in order to com- 
pute the derivatives of the filter variables to unknown system parameters re- 
quired in a variety of applications. For instance, our results can be used in 
gradient-search optimization algorithms for the maximum likelihood estima- 
tion of unknown system parameters. We showed that the new technique leads 
to a stable score and FIM evaluation. Furthermore, the methods are ideal for 
simultaneous state estimation and parameter identification since all values are 
computed in parallel. Finally, the new approach replaces the standard tech- 
nique based on direct differentiation of the Kalman filtering equations (with 
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their inherent numerical instability) and, hence, improves the robustness of 
computations against roundoff errors. The set of numerical experiments ex- 
hibits the markedly worse performance of the conventional Kalman filter ap- 
proach compared to the proposed array square-root based scheme. This creates 
a strong argument for practical application of the latter technique. 
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Appendix 

For the readers' convenience, we present the MatLab codes correspo nding to 



the m ethods developed in Section 4. We follow the rational idea of iHigham 



(|2001l ). The author remarks that "the best way to learn is by example .." 
and "MatLab is an ideal environment for this type of treatment..". As in 
the cited paper, we hope that the comment lines in the programs will make 
the listings comprehensible to all readers who have some experience with a 
scientific programming language. The codes will be also downloadable from 
the personal webpage of the first author. Finally, we may note that applica- 
tion of this numerically stable ASR based approach to the gradient evalua- 
tion of the auxiliary quality functional in the parametric identi fication prob- 
lem for stochastic systems can be found in Tsyganova! (|201lh . Application 



to the m aximum likelihood est i matio n of stochastic volatility models is pre- 



sented in iKulikova and Tavlor ( 20131) . The corresponding MatLab codes are 
also downloadable. 

7, Dif f _QR_upper One iteration of array Kalman filtering algorithms 

°/ based on QR decomposition (where R is upper triangular) 

% and its extension on the derivative computation 

7. 
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'/, Given the pre-array A=[A11 A12;A21 A22] , compute the post-array R=[R11 0; R12 R22] 
7. Given the derivatives of the pre-array Dif f _pre_array , 

'/, compute the derivatives of the post-array (Lemma 2 from the paper) 

function [Post_array ,Dif f _R11 ,Dif f _R12] = Dif f _QR_upper (Pre_array ,Dif f _pre_array , 1 , s ,k) ; 

[Orthog,Post_arrayl] = qr (Pre_array (1 : s+k, 1 : s) ) ; '/, Q is any orthogonal rotation that 

% upper-triangular izes the first 
'/, (s+k) by (s) block of the pre-array 

if 1~=0 

Post_array2 = Orthog' *Pre_array(l : s+k, s+1 : s+1) ; '/„ compute R12, R22 blocks 
Post_array = [Post_arrayl Post_array2] ; '/, the full post-array 

end; 

Rll = Post_array (1 : s, 1 : s) ; '/, blocks of the post-array 

R12 = Post_array(l:s,s+l:s+l); 
R22 = Post_array(s+l:s+k,s+l:s+l) ; 

Q_applied_Dif f A = 0rthog'*Dif f _pre_array; '/, apply Q to the pre-array derivatives 

XX = Q_applied_Dif f A(l : s , 1 : s) ; '/, notations 

YY = Q_applied_DiffA(s+l:s+k,l:s) ; 

NN = Q_applied_DiffA(l:s,s+l:s+l) ; 

VV = Q_applied_DiffA(s+l:s+k,s+l:s+l) ; 

matrix_product = XX/R11; "/, compute the matrix product 

'/, Note, MatLab treats correctly the empty blocks 

Diff_Rll = (tril(matrix_product ,-1) '+triu(matrix_product) ) *R11 ; 
Diff_R12 = (tril(matrix_product,-l) ' -tril (matrix_product ,-1) ) *R12+ ... 
Rll'\ YY'*R22+NN; 



Listing 1 M-file Dif f _QR_upper .m. 



'/, Dif f _QR_lower One iteration of array Kalman filtering algorithms 

'/, based on QR decomposition (where R is lower triangular) 

% and its extension on the derivative computations 

"/. 

•/, Given the pre-array A=[A11 A12;A21 A22] , compute the post-array R=[0 R21; R12 R22] 
"/, Given the derivatives of the pre-array Dif f _pre_array , 

"/, compute the derivatives of the post-array (Lemma 1 from the paper) 

function [Post_array,Dif f _R11 ,Dif f _R12] = Dif f _QR_lower (Pre_array ,Dif f _pre_array , 1 , s ,k) ; 

[Orthog, Post_arrayl] = ... 

qr_lower (Pre_array (1 : s+k, 1 : s)) ; '/, Q is any orthogonal rotation that 

'/, lower-triangular izes the first 
'/, (s+k) by (s) block of the pre-array 

if 1~=0 

Post_array2 = Orthog' *Pre_array(l : s+k, s+1 : s+1) ; '/, compute R12, R22 blocks 
Post_array = [Post_arrayl Post_array2] ; '/, the full post-array 

end; 



R21 = Post_array(k+l:k+s,l:s) ; 
R12 = Post_array(l:k,s+l:s+l) ; 



'/, blocks of the post-array 
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R22 = Post_array(k+l:s+k,s+l:s+l) ; 

Q_applied_Dif f A = Orthog'*Diff _pre_array; 7. apply Q to the pre-array derivatives 

XX = Q_applied_DiffA(l:k,l:s); 7. notations 

YY = Q_applied_DiffA(k+l:s+k,l:s) ; 

NN = Q_applied_DiffA(l:k,s+l:s+l) ; 

VV = Q_applied_DiffA(k+l:s+k,s+l:s+l) ; 

matrix_product = YY/R21; 7. compute the matrix product 

7, Note, MatLab treats correctly the empty blocks 

Diff_R21 = (tril(matrix_product)+triu(matrix_product , 1) ' ) *R21 ; 
Diff_R22 = (triu(matrix_product , 1) ' -triu(matrix_product , 1) ) *R22+ ... 
R21'\ XX'*R12+VV; 



Listing 2 M-file Dif f _QR_lower .m. 



